Introduction. The study seeks to answer two questions: 
How do university students learn to use correct strategies to 
conduct scholarly information searches without instructions? 
and, What are the differences in learning mechanisms 
between users at different cognitive levels? 

Method. Two groups of users, thirteen first year 
undergraduate students (freshmen) and thirty-four final 
year undergraduate students (seniors), were recruited into 
our experimental study and executed ten different search 
tasks independently. Five reinforcement learning models 
were introduced to quantitatively simulate the micro process 
of users' self-regulated learning of search expertise by trial 
and error. 

Analysis. The experimental data were divided into two 
parts. The first 70% of the data was used to estimate the 
parameters of each model. The remaining 30% was fitted by 
the estimated models. The model best fitting the data of users 
in each group was used to explain their learning behaviour. 

Results. Most undergraduates tended to repeat the 
strategies that brought success in their earlier experiences. 
Freshmen's learning behaviour manifested remarkable 
Markov properties. Their strategy selection was always 
made according to the feedback obtained in the last search 
activity. Seniors' strategy adjustment depended on the 
accumulated effect of past strategy adoptions. They 
displayed strong characteristics of rational thinking. 
Conclusions. In the process of learning searching expertise, 
users demonstrate reinforcement characteristics. Moreover, 
users at different cognitive levels exhibit different 
reinforcement patterns. Theoretical and practical 
implications were proposed from the perspectives of training 
programme design, adaptive information retrieval system 
design and information behaviour model development. 
Introduction 

The present study is designed to investigate how university students learn to use 
the search functions provided by scholarly databases and adjust their searching 
strategies without instructions. The focus of this research is on learning of 
information searching skills in practice. Here information searching means 'a 
potential sub-stage in the information-seeking process' (Wilson 1999: 258) and 
'the micro-level of behaviour employed by the searcher in interacting with 
information systems' (Wilson 2000: 49). 

Since Belkin in the 1980s, information science has attempted to bring its 
information seeking perspective into information search (e.g., Bates 2002; Belkin 
et al. 1996; Ingwersen 1996; Saracevic 1996; Sutcliffe et al. 2000; Spink 1997; 
Spink et al. 2002a; Spink et al. 2002b; Wilson 1999; Wilson 2000; Wilson et al. 
2002). However, the literature bridging learning and information searching 
sheds more light on learning of knowledge by searching (Colvin and Keene 
2004; Ford et al. 2003; Laxman 2010; Marchionini 2006; Puustinen and Rouet 
2009; Zhu et al. 2011) than on learning of searching by practice. By learning 
of knowledge by searching, we mean that users acquire knowledge for sense-making 
or problem-solving purposes through information searching, while 
learning of searching by practice means that users improve their search 
skills through practising searches. Several longitudinal studies (Chu and Law 
2007; Vakkari 2001; Warwick et al. 2009) examined users' experiences of 
academic information seeking and the development of their search expertise. 
However, few attempts have been made to disclose the behavioural evolution and 
cognitive dynamics during users' self-regulated learning of search skills by trial 
and error. 

In this study, it is assumed that there is an autonomous reinforcement learning 
process during academic users' information searching and that different users 
demonstrate different reinforcement patterns of learning. Specifically, 
mathematical reinforcement learning models are brought in to fit the data from 
user experiments. By model fitting and analysis, this study aims to discover the 
characteristics of users' searching behaviour, and the learning mechanisms 
controlling users' adjustments of search strategies. 

The rest of the paper proceeds as follows: related research is reviewed; then the 
research questions and assumptions are proposed, followed by the description of 
the quasi-experiment approach employed in this study, and the process of model 
estimation and validation; the results and discussion are presented and concluded 
afterwards; finally, implications and further research are discussed. 

Literature review 

Learning in information searching 

Learning of searching by practice versus learning of 
knowledge by searching 

As commented by Jansen et al. (2009), in many studies concerning information 
seeking behaviour, the learning aspect is assimilated into other frameworks, such 
as sense-making and problem-solving (Brand-Gruwel et al. 2000; Eisenberg and 
Berkowitz 1990; Kuhlthau 1993; Savolainen 1993). Most of the research linking 
information seeking or searching with learning emphasizes learning of knowledge 
by searching (Colvin and Keene 2004; Ford et al. 2003; Laxman 2010; 
Marchionini 2006; Puustinen and Rouet 2009; Zhu et al. 2011) rather than 
learning of searching by practice, although there are commonalities between 
these two kinds of learning process. Nevertheless, from previous studies, learning 
as a means to develop searching skills can still be found. For example, some 
studies underline users' learning and understanding of search tasks, information 
needs (Cole et al. 2007; Kelly and Fu 2007) and search strategies (Halttunen 
2003; He et al. 2008; Saito and Miwa 2007). Cole et al. (2007) conducted a field 
study to examine how domain novices learned to represent the topic spaces of the 
search tasks. Kelly and Fu (2007) employed online elicitation forms to collect 
users' descriptions of the search topics. The forms were distributed to users in later 
experiments, and significantly helped them formulate better queries. Saito and 
Miwa (2007) carried out controlled experiments to evaluate the educational 
potentials of a deliberately constructed search-process feedback system in 
facilitating reflective activities for online searching. Their findings confirm that the 
performance of the participants supported by the feedback system improved 
substantially. He et al. (2008) examined the effects of two different training 
approaches, referred to as conceptual description and search practice, on users' 
learning and understanding of using a case-based reasoning retrieval system. 
Halttunen (2003) investigated students' interpretations of information retrieval 
know-how and summarized the principles of designing constructive learning 
environments for information retrieval. 

Learning process 

Studies regarding the process of users' learning of searching expertise can be 
classified into two categories: self-regulated learning (Jansen et al. 2009; 
Kuhlthau 1993; Xie 2000; Xie 2007) and instruction-assisted learning (Cole et al. 
2007; Gerjets and Hellenthal-Schorr 2008; Halttunen 2003; Kelly and Fu 2007; 
Kuhlthau et al. 2007; Saito and Miwa 2007). Besides, in the view that learning of 
searching expertise is a dynamic process, researchers (Chu and Law 2007; Vakkari 
2001; Warwick et al. 2009) conducted longitudinal investigations to track the 
change of users' searching expertise over time. 

By self-regulated learning, we mean that users complete their searches by themselves, 
without guidance from other people or from the system. Grounded in the constructivist view of 
learning, Kuhlthau (1993) presented a six-stage model of the information search 
process: initiation, selection, exploration, formulation, collection and presentation. 
The whole process involves 'the total person incorporating thinking, feeling and 
acting in the dynamic process of learning' (Kuhlthau 1993: 348), in which users 
move from uncertainty to understanding. Xie (2000; 2007) investigated how the 
interplay between plans and situations leads to users' shifts of strategies and 
interactive intentions within an information seeking session. The twofold shifts in 
Xie's study are essentially the results of users' self-assisted reflective learning. 
However, in existing research, the learning process in information seeking is 
aimed at problem solving, rather than search skill acquiring. In response to this, 
Jansen et al. (2009: 643) called for a learning theory, which 'may better describe 
the information searching process than more commonly used paradigms of 
decision making or problem solving'. Their research indicates that different 
learning levels relate to particular searching characteristics. The results partially 
support that searching episodes are learning events. 

In recent years, instruction-assisted learning including social learning and 
training-based learning has been much stressed and the influences of external 
intervention on users' learning extensively analysed. Kuhlthau et al. (2007) 
elaborated guided inquiry as 'a dynamic, innovative way of developing information 
literacy'. Cole et al. (2007) claimed that instructive intervention helps novices 
bridge the gap between their mental models and the thesaurus's hierarchical 
syndetic representation of the search topic. According to studies of Kelly and Fu 
(2007) and Saito and Miwa (2007), when provided with analogous information, 
such as keyword description of similar search topics and information about other 
participants' search process, participants greatly improve their search 
effectiveness. Halttunen (2003) maintained that information retrieval instruction 
should be integrated with constructive learning. Attempting to design constructive 
learning environments, Halttunen summarized five different aspects of 
participants' interpretations of information retrieval, and examined their 
relationship with learning styles and academic backgrounds. Gerjets and 
Hellenthal-Schorr (2008) proposed a user-oriented Web training based on a 
conceptual decomposition of the sub-competencies of media literacy and the sub- 
processes of information retrieval, and a task analysis of information problems. 
Their study shows that this training approach is more effective than 
conventional technique-oriented training in developing high school students' 
declarative knowledge of the Web and in facilitating their searching. 

Taking a long-term view, Chu and Law (2007) investigated twelve postgraduate 
students' growing understanding of searching skills over a one-year period. They 
collected data from surveys, interviews, students' search statements and think- 
aloud protocols. Their findings reveal that, in the beginning, students conducted 
more questionable subject searches, with little attention paid to keyword 
searching; later, as they learned more about the capabilities of keyword searching, 
they preferred keyword searching to subject searching, and at the same time they 
proceeded from simple keyword searches to more complex keyword searches. 

Vakkari (2001) observed eleven master's students' information searching processes 
during a period of four months when they were preparing their research proposals. 
The research corroborates that students' exhibited searching characteristics 
(including information needs, search tactics, term choices, relevance evaluation 
and use of obtained information) correlate highly with their problem-solving 
stage and their mental model. Based on a two-year investigation of the growth of 
information seeking skills in a group of undergraduate students, Warwick et al. 
(2009) found that the demands of students' undertakings act as the major factor 
leading to the progress of their information seeking; students follow the law of 
minimum effort to retain established information-seeking strategies or seek new 
methods. Whereas studies by Chu and Law (2007) and Vakkari (2001) provide 
little evidence on how users acquire the knowledge, research done by Warwick et 
al. (2009) draws a more detailed picture of users' development of searching 
expertise. 

Influencing factors 

Besides measuring the impacts of external instructions, the majority of previous 
work concerning learning behaviour in information searching highlights the 
influences on users' learning process of users' personal characteristics (including individual 
experience, knowledge, cognitive style, learning style, and so on) (Bilal and Kirby 
2002; Jansen et al. 2009; Tabatabai and Shore 2005; Tenopir et al. 2008; 
Thatcher 2008; Wildemuth 2004; Zhang 2008), task complexity (which is 
associated with users' familiarity with the search task) (Jansen et al. 2009; Kim 
2002; Zhang 2008) or system characteristics (Wilson et al. 2009). 

For instance, using comparative studies, Bilal and Kirby (2002), Tabatabai and 
Shore (2005) and Thatcher (2008) reported that users with different knowledge 
backgrounds or cognitive capacity (such as novices and experts, children and 
adults) exhibit different behavioural characteristics in information searching. 
Wildemuth (2004) conjectured that domain knowledge affects the adjustments of 
search tactics: insufficient domain knowledge is accompanied by awkward 
concept representations and erroneous reformulations of search patterns. Zhang 
(2008) explored the effects of mental models on undergraduate students' online 
searching. The researcher concluded from experimental studies that students' 
familiarity with the task significantly influences their ways of initiating interaction, 
constructing queries, and choosing search tactics. Recently, Jansen et al. (2009) examined 
the learning characteristics of users with different cognitive levels in completing 
search tasks of different complexities. Their study substantiates the differences in 
exhibited searching characteristics among users of different learning styles. 
Tenopir et al. (2008) examined the affective and cognitive dimensions of 
searching behaviour and included learning styles as an influencing factor. They 
recruited 41 participants for experiments and used audio/video devices to 
capture and record their interactions with ScienceDirect. The researchers reported 
the associations between engineering graduate students' learning styles 
(converging vs. assimilating) and the characteristics of their search sessions. Kim 
(2002) confirmed that cognitive style (field dependence vs. field independence), 
search experience (novice vs. experienced searchers), and task type (known-item 
vs. subject search tasks) are variables impacting users' search performance and 
navigational style on the Web. Wilson et al. (2009) quantified the strengths and 
weaknesses of three advanced search interfaces in scaffolding user-system 
interactions by integrating existing research models of users, needs, and 
behaviour. 

In summary, prior research has attempted to connect information searching with 
learning; however, limited efforts have been made to model the underlying process 
of users' learning in information searching. This is preliminarily examined in our 
work. 

Reinforcement learning models 

Humans share with other animals a simple way of learning, which is usually called 
reinforcement learning. This reinforcement learning seems to be biologically 
inherent. If an action leads to a disadvantageous outcome (also referred to as a negative 
payoff or punishment), this action will be avoided in the future; otherwise, if an 
action leads to a favourable outcome (a positive payoff or reward), it will reoccur 
(Brenner 2006; Sutton and Barto 1998). Here, the word action can also be 
understood as strategy. 

In the spirit of reinforcement learning, a variety of reinforcement learning models 
have been established in psychology, economics and computer science to 
quantitatively analyse different learning behaviour in different contexts (Borgers 
and Sarin 2000; Bush and Mosteller 1955; Cross 1973; Erev and Roth 1996; Fu 
and Anderson 2006; Izquierdo et al. 2007; Roth and Erev 1995; Shimokawa et al. 
2009). Among them, Bush and Mosteller's model (Bush and Mosteller 1955), 
Cross's model (Cross 1973), Borgers and Sarin's model (Borgers and Sarin 2000) 
and Roth and Erev's two models (Roth and Erev 1995; Erev and Roth 1996) can be 
regarded as the five most typical ones, and are employed to fit the experimental 
data in our study. These models are briefly compared in Table 1. More detailed 
mathematical descriptions regarding these models can be found in the Appendix. 


| Model | Mechanism by which payoffs affect strategy adjustments | Measure of the extent to which payoffs affect strategy adjustments | Basic ideas |
| --- | --- | --- | --- |
| Bush and Mosteller's model | Payoff of the last strategy adoption | A fixed constant | When a certain strategy leads to a positive payoff, the probability of this strategy being chosen again increases and the probability of it being avoided decreases. Otherwise, the probability of the strategy being further adopted decreases and the probability of it being avoided increases. |
| Borgers and Sarin's model | Payoff of the last strategy adoption | Difference between the actual payoff and the expected one | If the actual payoff of a strategy exceeds the expectation, the probability of this strategy being further selected increases; if the payoff is smaller than the expectation, the probability of the strategy being further adopted decreases. |
| Cross's model | Payoff of the last strategy adoption | A monotonic function of the payoff | The attraction of a strategy is defined as a linear function of the payoff, by configuring the reinforcement strength as a variable correlated to the payoff. |
| Roth and Erev's model | Accumulated effects of all the previous strategy adoptions | Accumulated payoff from adopting a strategy | Decision makers choose a strategy based on their experiential expectations for all strategies. These expectations result from the accumulated effect of their past strategy adoptions, not only the last one. |
| Roth and Erev's modified model | Accumulated effects of all the previous strategy adoptions | Accumulated payoff from adopting a strategy (taking forgetting, subjective cognition and neighbour strategies into account) | A forgetting parameter is incorporated into the basic model of Roth and Erev to measure the attenuation degree of users' experiences influencing their strategy selections. A transferring parameter is added to determine the extent of the reinforcement strength being transferred to the unemployed strategies. At the same time, different individuals make different subjective evaluations of a strategy even when the payoffs from applying the strategy are equal. |

Table 1: Comparison of the five typical reinforcement learning models 
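
The contrast drawn in Table 1 between models driven by the last payoff and models driven by accumulated payoffs can be illustrated with a minimal Python sketch. It uses the commonly cited textbook forms of the Bush and Mosteller linear update and the basic Roth and Erev propensity update, which may differ in detail from the formulations in the Appendix. The function names, the binary success flag and the proportional rescaling of the unchosen strategies are assumptions made for illustration only.

```python
def bush_mosteller_update(probs, chosen, success, alpha, beta):
    """Last-payoff update (assumed textbook form).

    Only the outcome of the latest search changes the choice probabilities.
    Requires probs[chosen] < 1 so that the other strategies can be rescaled.
    """
    p = probs[chosen]
    # Reward pulls the probability towards 1; punishment shrinks it.
    p_new = p + alpha * (1.0 - p) if success else p - beta * p
    # Rescale the unchosen strategies so that the probabilities still sum to 1.
    scale = (1.0 - p_new) / (1.0 - p)
    return [p_new if j == chosen else q * scale for j, q in enumerate(probs)]


def roth_erev_update(propensities, chosen, payoff):
    """Accumulated-payoff update (basic form): add the payoff to the propensity."""
    updated = list(propensities)
    updated[chosen] += payoff
    return updated


def choice_probabilities(propensities):
    """Choice probabilities proportional to accumulated propensities."""
    total = sum(propensities)
    return [q / total for q in propensities]


# Three strategies, all equally likely at the start (an assumed starting point).
print(bush_mosteller_update([1 / 3, 1 / 3, 1 / 3], chosen=2, success=True,
                            alpha=0.2, beta=0.1))
print(choice_probabilities(roth_erev_update([1.0, 1.0, 1.0], chosen=2, payoff=1.0)))
```

In the first function the history before the last search is summarised entirely by the current probabilities, whereas in the second every past payoff remains in the propensities, which is the distinction the table draws.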

The process of information searching is also a process of decision-making or 
action-taking (Du and Spink 2011; Kuhlthau 1993; Savolainen 1993). Users exhibit 
similar reinforcement learning characteristics in this process. Reinforcement 
learning models can be adopted or revised to disclose the mechanisms dominating 
users' learning of searching knowledge. This is further studied in our research. 

Research questions and assumptions 

Research questions 

The focus of this study is on learning of searching by practice, instead of learning 
of knowledge by searching. It also concerns the effects of personal traits (e.g., 
information seeking experience and academic backgrounds) on users' learning of 
search strategies in information searching. However, it does not aim to provide evidence 
for or against these effects by qualitative or quantitative analysis of data gathered 
from experiments, questionnaires, interviews or observations. Rather, this study 
brings in several reinforcement learning models to examine the micro process of 
users' self-regulated learning of search expertise by trial and error. It aims at 
mining the mechanisms underlying users' behaviour adjustments and discovering 
their learning characteristics and cognitive dynamics during information 
searching. 

The specific research questions are as follows: 

1. How do university students learn to use correct strategies to conduct 
scholarly information searches without instructions? In other words, are 
there learning rules controlling their strategy adjustments during searching? 
If so, what are the rules? 

2. What are the differences in learning mechanisms between users at different 
cognitive levels? 

Assumptions 

The research question design, experiment design, model application and 
explanation in this study are founded on the following assumptions: 

(1) In the process of self-regulated learning of searching expertise, 
users demonstrate reinforcement characteristics. 

When a user completes a search task by a certain strategy, the user may evaluate 
this process in terms of time cost, quantity of relevant results, and so on. 
Depending on this evaluation, the user will form a tendency to retain this strategy 
or reject it by switching to other strategies for next tasks. In other words, users 
adjust their behaviour by referring to their experience of using the database and based 
on their knowledge about the available strategies. This process of dynamic 
alignment tallies with the core conception of reinforcement learning (Sutton and 
Barto 1998). Figure 1 describes this process of strategy reformulation. 



Figure 1: Reinforcement learning mechanism in search strategy formulation 


The above process of reinforcement learning and search strategy adjustments is 
also consistent with the information search process proposed by Ellis (1989) and 
Wilson (1997), in which a user first defines information needs, and then 
formulates or selects a search strategy, performs searching or browsing, obtains 
and evaluates the search results. 

(2) Users at different cognitive levels demonstrate different 
reinforcement patterns. 

It is assumed that users' personal traits have impacts on their information 
behaviour, and there are differences in the reinforcement characteristics between 
different users during their learning of searching expertise. This assumption is 
justified in the present study by introducing different reinforcement learning 
models to fit the experimental data collected from different user groups, and 
evaluating the applicability of the models to the data. 

Research design 

Overview 

A quasi-experiment approach was designed according to the requirements of data 
analysis and model inference and fitting. Two groups of undergraduates at 
different cognitive levels participated in the experiments in January 2009. They 
were asked to execute set search tasks in a specified academic database system 
independently. The process of their strategy adjustments by trial and error was 
observed and recorded by questionnaires and screen-tracking software. The 
gathered experimental data were quantitatively fitted by different reinforcement 
learning models. The fitness of the models to the data was checked and the best 
model to explain the learning behaviour of users in each group was chosen. By 
doing this, the dynamic learning mechanisms behind users' explicit strategy 
formulations were analysed and the differences in learning characteristics between 
different user groups were examined. 

Participants 

In the first experiment, thirteen first-year undergraduate students (freshmen) who 
had little knowledge of academic information searching were brought together in our 
laboratory, while in the second experiment, thirty-four fourth-year undergraduate 
students (seniors) who did have experience of academic information seeking 
took part together. All students had experience of using Google or Baidu (a 
well-known local search engine in China). 

It is supposed that there are discrepancies in the level of cognitive processing 
between freshmen and senior students, considering the differences in their 
information seeking experience, knowledge and capability of comprehension, 
application, analysis, synthesis and evaluation (Bloom et al. 1956). The cognitive 
level of participants is the independent variable in this study. It is assumed to 
affect the dependent variable, i.e., users' reinforcement learning behaviour. 

Experiment settings 

All participants were required to log in to the search page of CNKI, a well-known 
scholarly database system in China, and perform ten different search tasks without 
extra instructions. 

The same search tasks were assigned to all participants. These tasks were designed 
before the experiments by the researchers. The tasks relate to different subjects. A 
task form giving descriptions for each task was handed out to participants before 
they started the tasks. The descriptions include the task title and several keywords 
associated with the task topic, which removed the chance participants would 
misunderstand the task. 

For each task, the researchers had done a test search in the database system 
beforehand, and labelled all the relevant search results. These results served as 
standard ones. Once participants finished a task, the standard results were 
presented to them to check the correctness of their search performances. 

A questionnaire was devised to solicit the perceptions of a participant with regard 
to the formulated search strategy for each task. The perceptions include: 

1. The description of the search strategy, including the search function, the 
keywords, the way the keywords were input, and additional details; 

2. The participant's expectation of the strategy bringing desired results; 

3. The satisfaction of the participant with the strategy after applying it and 
comparing the results with the standard ones. 

An incentive mechanism was designed to guard against insufficient motivation 
to complete the tasks: those who got better search results would be 
rewarded with attractive presents. 

Besides, participants were told by the researchers that for each task: 

1. All keywords that represent the task topic must occur in each title of the 
search results. To this end, participants must learn to use multiple search 
boxes and the logical AND connector, so that they could enter one keyword per 
box and formulate a correct query to fulfil the task. 

2. Search results totally consistent with the standard results would be 
considered satisfactory, and presents would be awarded to those who reached 
the satisfactory results. 

The participants' interactions with the database system were recorded by 
screen-tracking software to provide extra information for data analysis. 

The above experimental design provides a quasi-experiment approach. The 
variables such as experimental environments, search tasks, information need 
understanding, and external stimulations were controlled to be consistent between 
each participant. As for information need understanding, it was not necessary for 
participants to figure out what keywords should be used for each task, since 
standard keywords were offered in the task form. With respect to external 
stimulations, there was no instruction supplied to participants, and the same 
incentive mechanism was applied to each of them. 

By controlling the above interventions, the effects of factors other than 
participants' cognitive levels were excluded from the experiments to the maximum 
extent practicable, and therefore the process of participants' strategy adjustments 
in performing the search tasks could be more accurately observed. 

Search strategies 

In relation to search strategies, Bates (1979) defined twenty-nine tactics in four 
categories: monitoring, file structure, search formulation and term. In Bates's 
model, search formulation tactics are the moves that searchers make to design or 
redesign search formulation, while term tactics are the actions searchers take in 
selecting and revising terms within the search formulation. Likewise, Belkin et al. 
(1996) proposed a classification scheme of search strategies. In Belkin's taxonomy, 
strategies encompass term strategies, database strategies, interaction strategies, 
and search strategies. Search strategies or tactics in these studies are 
conceptualised to describe the possible actions a user can take from initiating a 
search task to concluding it. 

In the present research, a search strategy refers to the action that a participant 
takes to carry out a search task, by selecting one of the search functions offered by 
the search system and formulating a search query. The optional search functions 
include the basic search, the advanced search and the expert search. To facilitate 
model inference and fitting, the search strategies that a participant could apply to 
construct a query were categorised into three types: 

1. The first type, the simple-search strategy, refers to when a participant 
inputs all the keywords in a single textbox either in the basic search page or 
the advanced page. Since in the experiment system, those input keywords 
without any Boolean operator are processed according to default 'OR' logic, 
this strategy may incur much irrelevant feedback. In other words, the search 
results may be of high recall but of low accuracy. 

2. The second type, the unsuccessful multiple-textbox strategy, refers to when 
a participant selects the advanced search, inputs keywords in multiple 
textboxes with one word in one box, but does not specify any Boolean 
operator to logically connect the keywords. As with the simple-search 
strategy, the system processes the keywords under 'OR' logic, and the 
user may not get results that exactly match the standard results. However, 
from the perspective of learning, when participants apply this strategy, they 
gain some grasp of the advanced search, which is supposed to be 
more effective than the simple search. 

3. The third type, the logic-AND-search strategy, is the target strategy for the 
experiments in our study. When applying this strategy, a participant selects 
the advanced search, inputs keywords in multiple textboxes with one word in 
one box, and uses 'AND' operators to organise the keywords into a 
meaningful query. If all the required keywords associated with a search task 
are input, this strategy is expected to lead to correct search results. 
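
The three-way categorisation above can be sketched as a small coding routine in Python. This is only an illustrative sketch: the record fields (`search_function`, `textboxes_used`, `uses_and`) and the numeric codes are hypothetical and are not taken from the questionnaire or the screen recordings used in the study.

```python
from dataclasses import dataclass

# Hypothetical numeric codes for the three strategy types.
SIMPLE_SEARCH = 1
MULTIPLE_TEXTBOX_NO_AND = 2
LOGIC_AND_SEARCH = 3


@dataclass
class QueryRecord:
    """One formulated query (field names are assumptions for illustration)."""
    search_function: str   # "basic", "advanced" or "expert"
    textboxes_used: int    # number of textboxes that received keywords
    uses_and: bool         # True if the keywords are joined by AND operators


def classify_strategy(record):
    """Map a query record to one of the three strategy types, or None."""
    if record.search_function == "advanced" and record.textboxes_used > 1:
        # Several textboxes: the Boolean connector decides between types 2 and 3.
        return LOGIC_AND_SEARCH if record.uses_and else MULTIPLE_TEXTBOX_NO_AND
    if record.search_function in ("basic", "advanced") and record.textboxes_used == 1:
        # All keywords in one textbox, processed under the default OR logic.
        return SIMPLE_SEARCH
    # Anything else (for example the expert search) falls outside the three types.
    return None


print(classify_strategy(QueryRecord("advanced", 3, True)))  # 3
```

Coding each query as a single integer in this way gives the discrete strategy sequence per participant that a reinforcement learning model needs as input.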

From the collected experimental data, it was found that no student ever made 
attempts at the expert search. 

Procedure 

Given a search task, a participant was asked to carry out the following process: 

1. Understand the task by examining the required keywords listed in the task 
form; 

2. Figure out a strategy, including the search function and the keyword 
inputting scheme; 

3. Depict the search strategy on the questionnaire; 


4. Write down an expectation score (i.e., the participant's confidence of the 
strategy bringing desired results) on the questionnaire; 

5. Execute the search (namely apply the formulated strategy); 

6. Evaluate the search results by comparing them with the standard results 
presented by the organisers; 

7. Write down a satisfaction score on the questionnaire; 

8. Continue the next search task until all tasks are completed. 

Each participant's learning process was observed by tracking their strategy 
adjustments in executing all the search tasks in sequence. 

Data analysis 

For each of the two student groups, the collected experimental data were divided 
into two parts: (1) The first 70% of the data (associated with the first seven search 
tasks) were used to infer the parameters of each model; (2) The remaining 30% 
(regarding the last three tasks) were fitted by the estimated models. The model 
best fitting the data was used to explain the learning behaviour of the users in the 
corresponding group. 
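The split described above can be sketched as follows. This is only an illustration: the integer strategy coding (0 = simple search, 1 = multiple-textbox strategy, 2 = logic-AND-search), the function name and the example participant are assumptions, not part of the study's materials.

```python
from typing import List, Tuple


def split_by_task(choices: List[int], n_train: int = 7) -> Tuple[List[int], List[int]]:
    """Split one participant's ordered strategy choices into the
    estimation part (first n_train tasks) and the hold-out part (the rest)."""
    return choices[:n_train], choices[n_train:]


# Hypothetical participant; strategy codes are an assumed encoding:
# 0 = simple search, 1 = multiple-textbox strategy, 2 = logic-AND-search.
example_choices = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
train_part, holdout_part = split_by_task(example_choices)
```

With ten tasks per participant, the first seven choices feed parameter estimation and the last three are kept for verification.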

Estimation of model parameters 

The maximum likelihood method was used to estimate the parameters of each
model with regard to the experimental data of each group. The likelihood function
for the g-th group and k-th model is defined as:

$$LL_{g,k}(\theta) = \prod_{i=1}^{N_g} \left( \prod_{t=1}^{T} P_{i,j}^{k}(t) \right)$$
(Equation 1)

where $\theta$ denotes the parameters, $T=7$ is the number of training tasks, and $N_g$ is
the number of participants in group g. $P_{i,j}^{k}(t)$ stands for the attraction of strategy j
adopted by user i for task t, and is computed under the updating rules of model k.
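As a rough illustration of how such a likelihood could be maximised, the sketch below evaluates the logarithm of the likelihood for Cross's model (Equations 8 and 9 in the Appendix) and picks the best parameter pair on a coarse grid. Several things here are assumptions rather than details reported in the paper: payoffs scaled to [0, 1], uniform initial attractions, clipping of the reinforcement strength to [0, 1], the grid search itself, and all function names.

```python
import math
from itertools import product


def cross_update(p, chosen, payoff, alpha, beta):
    """One step of Cross's rule; the strength R is clipped to [0, 1] (assumption)."""
    r = min(1.0, max(0.0, alpha * payoff + beta))
    return [pj + r * (1 - pj) if j == chosen else pj - r * pj
            for j, pj in enumerate(p)]


def log_likelihood(data, alpha, beta, m=3):
    """Sum of log-attractions of the strategies actually chosen.

    data: list of (choices, payoffs) pairs, one pair per participant,
    restricted to the training tasks.
    """
    total = 0.0
    for choices, payoffs in data:
        p = [1.0 / m] * m  # uniform initial attractions (assumption)
        for d, pi in zip(choices, payoffs):
            total += math.log(max(p[d], 1e-12))  # guard against log(0)
            p = cross_update(p, d, pi, alpha, beta)
    return total


def grid_search(data, step=0.1):
    """Return the (alpha, beta) pair on a regular grid with the highest log-likelihood."""
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    return max(product(grid, grid), key=lambda ab: log_likelihood(data, *ab))


# Hypothetical training data: one participant who always repeats strategy 0.
toy_data = [([0, 0, 0], [1.0, 1.0, 1.0])]
best_alpha, best_beta = grid_search(toy_data)
```

Maximising the sum of logarithms is equivalent to maximising the product in Equation 1 and avoids numerical underflow when many probabilities are multiplied.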

Table 2 details the parameter estimates. 


Student group   Bush and Mosteller's model             Borgers and Sarin's model   Cross's model                                  Roth and Erev's modified model
Freshmen        $\alpha_{BM}=0.2$; $\beta_{BM}=0.1$    $\beta_{BS}=0.100$          $\alpha_{CR}=0.1$; $\beta_{CR}=0.1$            $\varphi=0$; $\varepsilon=0.3428$
Seniors         $\alpha_{BM}=0.1$; $\beta_{BM}=0.1$    $\beta_{BS}=0.258$          $\alpha_{CR}=0.1$; $\beta_{CR}$ = [illegible]  $\varphi=0.4$; $\varepsilon=0.2407$

Table 2: Parameter estimates

Note that there is no parameter in Roth and Erev's basic model. The parameter $X_{min}$ in
Roth and Erev's modified model can be derived directly from the questionnaire data:
it is the minimum expectation per participant for all strategies.











Model fitting and verification 

The final models were obtained by replacing the parameters with the estimates.
The models were then applied to the experimental data associated with the last
three search tasks: given a participant and a task, the probabilities of the
participant choosing each search strategy were computed, and the strategy
with the maximum probability was marked as the predicted strategy. This process is
referred to as model fitting, or in this study, strategy simulation.

The effectiveness of model fitting was evaluated by measuring the difference
between the simulated strategies derived from each model and the actual strategies
that participants took. In the present study, this difference was gauged by the mean
squared distance. The mean squared distance for the i-th participant and the k-th
model is computed as follows:

$$MSD_{i,k} = \sum_{t} \sum_{j=1}^{m} \left( P_{i,j}^{k}(t) - I(j, d_i(t)) \right)^2 \Big/ (0.3 \cdot T \cdot m)$$
(Equation 2)

where $T=10$ is the total number of search tasks, m denotes the size of the strategy set,
$P_{i,j}^{k}(t)$ is the probability of participant i taking strategy j to fulfil task t predicted by
model k, $d_i(t)$ denotes the actual strategy chosen by participant i in period t, and
$I(j, d_i(t))$ is a contingent decision function whose value is 0 when $j \neq d_i(t)$ or 1 when
$j = d_i(t)$. The outer sum runs over the $0.3 \cdot T = 3$ hold-out tasks.
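The prediction step and the mean squared distance can be sketched as follows, reading Equation 2 as an average of squared differences between predicted probabilities and a 0/1 indicator of the actual choice over the hold-out tasks and the strategy set. The function names and the example numbers are illustrative assumptions.

```python
def predict_strategy(probs):
    """Return the index of the strategy with the maximum predicted probability."""
    return max(range(len(probs)), key=lambda j: probs[j])


def mean_squared_distance(pred_probs, actual):
    """Mean squared distance between predicted probabilities and actual choices.

    pred_probs[t][j]: predicted probability of strategy j in hold-out task t.
    actual[t]: index of the strategy actually chosen in hold-out task t.
    The denominator len(actual) * m plays the role of 0.3 * T * m in Equation 2.
    """
    m = len(pred_probs[0])
    total = 0.0
    for t, chosen in enumerate(actual):
        for j in range(m):
            indicator = 1.0 if j == chosen else 0.0
            total += (pred_probs[t][j] - indicator) ** 2
    return total / (len(actual) * m)


# Hypothetical predictions for the three hold-out tasks of one participant.
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
perfect = [[0.0, 0.0, 1.0]] * 3
```

A model that always assigns probability one to the strategy actually chosen scores zero; larger values indicate a poorer fit.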

Table 3 reports the mean and standard deviation of the mean squared distances 
with regard to each student group and each model. 


Mean and standard deviation of the mean squared distances per student group per model

Group      Statistic            Bush and Mosteller's   Cross's    Borgers and Sarin's   Roth and Erev's   Roth and Erev's
                                model                  model      model                 model             modified model
Freshmen   Mean                 0.05278                0.004631   0.022475              0.024239          0.016259
Freshmen   Standard deviation   0.013887               0.001122   0.010654              0.006058          0.002018
Seniors    Mean                 0.017591               0.009455   0.034181              0.006506          0.006587
Seniors    Standard deviation   0.006063               0.019643   0.150697              0.002358          0.001466

Table 3: Results of model verification

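For each group of students, the model with the smallest mean and standard
deviation of the mean squared distances was chosen as the optimal model for their
behaviour data. Based on the data in Table 3, Cross's model fits best for freshmen
(mean 0.004631, standard deviation 0.001122). For seniors, Roth and Erev's modified
model was chosen: its mean (0.006587) is nearly identical to that of Roth and Erev's
basic model (0.006506), and its standard deviation (0.001466) is the smallest of all models.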
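The comparison can be reproduced from the figures in Table 3 with a few lines of code. The dictionary layout, the shortened model labels and the function name are illustrative choices; the numbers are copied from the table.

```python
# (mean, standard deviation) of the mean squared distances, copied from Table 3.
TABLE_3 = {
    "freshmen": {
        "Bush-Mosteller": (0.05278, 0.013887),
        "Cross": (0.004631, 0.001122),
        "Borgers-Sarin": (0.022475, 0.010654),
        "Roth-Erev": (0.024239, 0.006058),
        "Roth-Erev modified": (0.016259, 0.002018),
    },
    "seniors": {
        "Bush-Mosteller": (0.017591, 0.006063),
        "Cross": (0.009455, 0.019643),
        "Borgers-Sarin": (0.034181, 0.150697),
        "Roth-Erev": (0.006506, 0.002358),
        "Roth-Erev modified": (0.006587, 0.001466),
    },
}


def rank_models(group, statistic):
    """Rank model labels for a group by 'mean' or 'sd', smallest first."""
    index = 0 if statistic == "mean" else 1
    stats = TABLE_3[group]
    return sorted(stats, key=lambda name: stats[name][index])


freshmen_by_mean = rank_models("freshmen", "mean")
seniors_by_mean = rank_models("seniors", "mean")
seniors_by_sd = rank_models("seniors", "sd")
```

For freshmen, Cross's model ranks first on both statistics. For seniors, the two Roth and Erev models lead on the mean with a difference in the fifth decimal place, and the modified model has the smallest standard deviation.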

Results 

Freshmen's learning: Cross's model 


From the data in Table 3, it can be inferred that freshmen's search strategy
adjustments comply more closely with Cross's model.
















(1) Freshmen showed insistence and inertia towards earlier strategy
preferences.

According to the updating rules of strategy attraction in Cross's model (Equations
8 and 9, see Appendix), freshmen (first year students) are more inclined to
continue the search strategies employed in their last task.

Table 4 presents the statistics of users' behaviour obtained from the experimental
data. It can be seen that freshmen were more likely to choose the simple search as
the initial strategy and input keywords in a single search box. They did so based on
their former experience of using general search engines.


Indicators                                                                       Freshmen   Seniors
Percentage of users with the initial strategy being the simple search            92.31      82.35
Average tasks after which users switched to the advanced search page             5.62       4.44
Average tasks after which users started to use the logic-AND-search strategy     7.61       5.74

Table 4: Statistics of users' learning behaviour in information searching

Table 4 also reports the average number of tasks after which users first switched to
the advanced search page and the average number of tasks after which they started to
use the logic-AND-search. The results show that freshmen took longer to leave the simple
search, learn to use new search functions and adopt new strategies. Their behaviour
followed a Markov process, and they were somewhat attached to their earlier
strategy preferences.

(2) Freshmen could finally give up experiential preferences and 
comprehend new strategies by learning. 

The parameter estimates of Cross's model for freshmen are $\alpha_{CR} = 0.1$ and $\beta_{CR} = 0.1$ (see
Table 2). This implies that freshmen showed insistence and inertia towards established
strategies, but not to a remarkable extent. As shown in Table 4, freshmen gave up
their preference for the simple search after 6 to 8 tasks on average. They
learned to use the advanced search and adopted the logic-AND-search strategy
through trial and error. Most freshmen eventually discovered and used the
logic-AND-search strategy, which was more likely to bring search results consistent with
the standard ones.

Seniors' learning: Roth and Erev's modified model 

The data in Table 3 indicate that for seniors (final year students), Roth and Erev's
modified model fits their learning behaviour better. They depended on
their past experiences to adjust search strategies. At the same time, they developed
strategies through rational thinking.














(1) Seniors were ready to make comprehensive decisions based on 
recent experiences. 

The estimate of the forgetting parameter $\varphi$ in Roth and Erev's modified model for
seniors is 0.4 (see Table 2). According to Equations 12-15, this means that, to a
non-negligible extent, seniors tended to make comprehensive decisions based on
their recent experiences. Basically, the more recently a search experience happened,
the greater its impact on the current decision making.

(2) Seniors showed strong subjectivity when evaluating the feedback 
from adopting a certain strategy. 

According to Equation 14 (see Appendix), $R(\pi(t)) = \pi(t) - X_{min}$, seniors
demonstrated strong cognitive subjectivity when making decisions. Different seniors
might make different evaluations of equal strategy payoffs.

Figures 2 and 3 depict the perceptions of the students who adopted the logic-AND-
search strategy. Figure 2 portrays the average expectation per task of the freshmen
and the seniors. Figure 3 illustrates the changes in their satisfaction. It can be
inferred that the freshmen held high expectations before applying the logic-AND-
search strategy, and consistently reported high satisfaction with the feedback. In
contrast, the seniors' expectations and satisfaction varied considerably across
tasks, and were almost always lower than those of the freshmen.



Figure 2: Average expectation per task of those students who adopted the logic-AND-search strategy 


























Figure 3: Average satisfaction per task of those students who adopted the logic-AND-search strategy 

(3) Seniors paid attention to neighbour strategies. 

The estimate of the transferring parameter $\varepsilon$ in Roth and Erev's modified model
for seniors is 0.2407 (see Table 2). According to Equations 12 and 13, this means that when
adjusting their strategy, seniors were affected not only by the information on
the strategy adopted in the last search, but also by the unemployed
strategies. The strength with which the unemployed strategies influenced their current
strategy selection is 24.07%. In other words, seniors paid attention to
neighbouring strategies.

Figure 4 describes the percentages of students who adopted the unsuccessful 
multiple-textbox strategy in each task. Figure 5 presents the percentages of 
students who correctly tried the logic-AND-search in each task. Interestingly, more 
seniors used logic-AND-search in the fourth task than in the fifth task. 
Correspondingly, fewer seniors took the unsuccessful multiple-textbox strategy in 
the fourth task than in the fifth task. That means some of the seniors who chose 
the correct strategy in one task returned to incorrect strategies in later tasks. This 
kind of phenomenon occurs several times (see Figures 4 and 5). After tracing back 
to the screen videos, the researchers found that a few seniors who had successfully 
employed the logic-AND-search started to explore other search options such as 
document type, year range, and so on. These options probably confused them and 
made them fail to use logic AND operators in subsequent tasks. Undoubtedly, 
those seniors displayed strong characteristics of rational thinking. This point is 
exactly what Roth and Erev's models try to reveal. 























Figure 4: Percentages of students who followed the unsuccessful multiple-textbox strategy (legend: freshmen students, senior students)

Figure 5: Percentages of students who adopted the logic-AND-search strategy (legend: freshmen students, senior students)

Summary 

The above findings give substantial answers to the research questions, and confirm 
the theoretical assumptions. 

• Question: How do users learn to use correct strategies to conduct scholarly 
information searches without instructions? In other words, are there learning 
rules controlling their strategy adjustments? If so what are the rules? 

Answer: In general, users demonstrated reinforcement learning
characteristics. The strategies that brought success in their earlier
experiences would be repeated with a higher probability. Through learning
by trial and error, both freshmen and seniors could finally comprehend new
search strategies. The answer to this question justifies the first assumption of this
study.

• Question: What are the differences in learning mechanisms between users at 
different cognitive levels? 

Answer: Users at different cognitive levels demonstrated different
reinforcement patterns. The learning behaviour of freshmen showed
remarkable Markov properties: their strategy selection was determined by
the feedback obtained in the last search activity. Cross's model better
explains their learning mechanisms. Seniors' strategy selection
depended on the accumulated effect of past strategy adoptions, and they
displayed strong characteristics of rational thinking. Roth and Erev's
modified model better describes their learning behaviour. The answer to this
question substantiates the second research assumption.

Discussion 

Characteristics of reinforcement learning 

It was found that most undergraduates preferred to repeat the strategies that had brought
success in their earlier experiences. This is highly consistent with the findings of
Warwick et al. (2009: 2402) that undergraduate students

used their growing expertise to justify a conservative information 
strategy, retaining established strategies as far as possible and 
completing tasks with minimum information-seeking effort. 

Specifically, according to this study, in the first task, 85% of undergraduates
(92.3% of freshmen and 82.4% of seniors; see Table 4) chose the simple search as
the initial strategy. The students were presumably influenced by
their former experience of using general search engines (Du and Evans 2011; Fast
and Campbell 2004; George et al. 2006; Haglund and Olsson 2008; Malliari et al.
2011).

There were differences in the reinforcement learning process between freshmen
and seniors, as previously claimed. Freshmen can be considered novices with
little perception of scholarly information seeking, while seniors are users with
more expertise. From this point of view, the differences in the reinforcement
learning patterns between freshmen and seniors can be elaborated through the findings of
Warwick et al. (2009: 2413), as follows:

Reflection on the learning theories of Kolb (1984) ... learners will 
often resist acquiring new skills because rejecting existing skill 
causes negative emotions (e.g., confusion, anger, upset). Existing 
skill is guarded zealously and adapted repeatedly until it finally 
fails ... Expert searchers therefore are not only differentiated by 
their existing skills but also potentially by their attitude to 
acquiring new ones. 

Warwick et al. grounded the above point by referring to Kolb's (1984) learning 
theories, which are congruous with the assumptions of this study. 













Effectiveness of reinforcement learning 


Consider the average number of tasks it took participants to change from the
simple search to the advanced search and to start the logic-AND-search (see Table
4). It can be concluded that the learning effectiveness of academic users through
self-regulated trial and error was not satisfactory. Freshmen in particular took
longer to learn the correct search strategy: on average, they started to
use the logic-AND-search after 7.61 of the 10 tasks. This highlights the need for
external instruction to improve the effectiveness of users' learning of information
seeking, especially for novices. Although this claim needs further
justification, it is supported by other studies
(Colvin and Keene 2004; Halttunen and Jarvelin 2005; Ren 2000).

Besides, seniors learned the correct search strategy more quickly than freshmen, as
described in Table 4. This is in agreement with the studies of Chen (2009), Eshet-
Alkalai and Chajut (2004), Hsieh-Yee (1993), Korobili et al. (2011), and Thatcher
(2008). Specifically, this study to some extent confirmed the findings of the recent
work by Korobili et al. (2011) that there are statistically significant
relationships between users' experience in databases or e-journals and the
variables: more than one keyword, Boolean operators as search techniques,
change strategy, different keywords as techniques to modify the initial strategy,
and so on.

Conclusions and implications 

The study observed the strategy adjustments of thirteen first-year undergraduates 
and thirty-four fourth-year undergraduates in carrying out ten search tasks in a 
specified database system independently. It was assumed that there are 
discrepancies in the level of cognitive processing between the two groups of users. 
The impacts of cognitive levels on learning of searching skills were examined by 
excluding the effects of other factors through quasi-experimental settings. When 
executing a search task, a user was asked to write down: (1) the description of the 
formulated search strategy; (2) the expectation of the strategy bringing desired 
results; and (3) the satisfaction with the strategy. The dynamics of search 
strategies, expectations and satisfactions of each user across different tasks were 
simulated through five reinforcement learning models. These dynamics were 
supposed to be the outcomes of participants' learning and reflection. 

It is found that undergraduates prefer to retain established strategies. It takes 
them a long time to change from the simple search to the advanced search and 
learn to use the most effective strategy. Generally, in the process of searching 
expertise learning, users demonstrate reinforcement characteristics. If a search 
strategy leads to satisfactory results, this strategy will be more likely to be repeated 
with high expectation later; if a strategy leads to unsatisfactory results, it will be 
more likely to be avoided afterwards. Specifically, users at different cognitive levels
demonstrate different reinforcement patterns. Freshmen's strategy selection is
always made according to the feedback obtained in the last search activity,
whereas seniors rely on their search experiences and rational thinking to make
comprehensive decisions.

Through observing and quantitatively simulating the micro process of academic 
users' learning of searching expertise, the current research enhances our 
understanding of users' experience of scholarly information seeking. Besides, based 
on the research outcomes and discussion, implications can be proposed from the 
perspectives of training programme design, adaptive information retrieval system 
design and theoretical development. 

As discussed above, learning through self-regulated trials is not the most
effective and economical way for academic users to develop searching expertise.
Extra instruction is needed to improve their learning performance. Instruction
can be imparted through training curricula offered by librarians, as well as
online learning or help features incorporated into information retrieval systems.
Rather than just a 'list of skills' of information literacy (Maybee 2006), the
instruction should be tailored to the learning patterns of different users. This
deserves further investigation by librarians.

By monitoring users' searching behaviour and identifying users' learning
characteristics, information retrieval systems can offer personalised support to
suit the users and their search tasks, and assist them in completing the tasks, as
suggested by Li and Belkin (2008), Stelmaszewska et al. (2005) and Xie and Cool
(2000), and technically practised by de la Chica et al. (2008), Frias-Martinez et al.
(2007; 2008), Hurst et al. (2007), Jansen (2005), Stelmaszewska et al. (2005)
and Tsuji and Yamamoto (2001). This kind of adaptive feature is expected to
facilitate users' learning of searching expertise and improve the effectiveness of 
their interactions with the search systems. The present research provides 
understanding of observational variables (e.g., initial search strategy, strategy 
adjustments, behavioural pathway, combination of Boolean operators, and so on) 
for automatically identifying users' learning characteristics in the development of 
such adaptive systems. 

Due to the small sample size, the findings reported in this paper are considered
exploratory and preliminary. Further efforts can be dedicated to developing a
comprehensive quantitative research framework that
synthesises learning theories and information-searching paradigms, as partly
described by Figure 1. It is expected to 'better describe the information searching
process than more commonly used paradigms of decision making or problem
solving' (Jansen et al. 2000: 643). According to Kuhlthau (1993: 342), the whole
information search process 'incorporates three realms of human experience: the
affective (feelings), the cognitive (thoughts) and the physical (actions)'. The
complexities of affective, cognitive and physical interactions within this process
require deliberate design of learning parameters and reinforcement adjustment
functions. Besides, the effects of contextual elements including instructional
variables (e.g., search tips, anchored help, graphic or video demos, result faceting,
clustering or visualisation, and so forth) on the performance of users' learning and
information searching should be included to establish a more meaningful learning
model.
Appendix: Reinforcement learning models

The basic ideas of the models listed in Table 1 are further explained as follows:

(1) Bush and Mosteller's model 

In Bush and Mosteller's model (Bush and Mosteller 1953), a probability variable
$P(t)$ is used to define the attraction of a strategy to a certain user (denoted as u).
Let $d(t)$ denote the strategy which is chosen by user u in period t, and $\pi(t)$ stand
for the reward or punishment fed back to the user in period t. A nonnegative $\pi(t)$
means the user gets a reward; otherwise the user gets a punishment. Suppose that in
period t, user u chooses the j-th strategy from the strategy set, i.e., $j=d(t)$. Then for u, the
attraction of strategy j is updated under the following rules:


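$$P(j,t+1) = \begin{cases} P(j,t) + \alpha_{BM} \cdot (1 - P(j,t)) & j = d(t) \wedge \pi(t) \geq 0 \\ P(j,t) - \beta_{BM} \cdot P(j,t) & j = d(t) \wedge \pi(t) < 0 \end{cases}$$
(Equation 3)

For each strategy k other than j (namely those unemployed strategies), the
attraction value is updated according to the following rules:

$$P(k,t+1) = \begin{cases} P(k,t) - \alpha_{BM} \cdot P(k,t) & k \neq d(t) \wedge \pi(t) \geq 0 \\ P(k,t) + \beta_{BM} \cdot (1 - P(k,t)) & k \neq d(t) \wedge \pi(t) < 0 \end{cases}$$
(Equation 4)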

In the above adjusting rules, $\alpha_{BM}$ and $\beta_{BM}$ are two parameters to be estimated.
$\alpha_{BM} \in [0,1]$ is the weight factor assigned to a nonnegative payoff, while $\beta_{BM} \in [0,1]$
is the weight factor assigned to a negative payoff. A smaller $\alpha_{BM}$ means that a nonnegative
payoff plays a slighter part in the strategy selection, while a smaller $\beta_{BM}$ means
that a negative payoff plays a minor role in the strategy selection.

More intuitively, the learning rules that Bush and Mosteller's model describes can 
be interpreted as: when a certain strategy leads to a positive payoff, the probability 
of this strategy being chosen again increases and the probability of it being avoided 
decreases; otherwise, the probability of the strategy being further adopted 
decreases and the probability of it being rejected increases. 
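A minimal sketch of these updating rules is given below. The function name and the list representation of attractions are assumptions made for illustration. The rules are implemented exactly as stated; note that with more than two strategies the punishment branch does not keep the attractions summing to one, so a renormalisation step may be needed in practice.

```python
def bush_mosteller_update(p, chosen, payoff, alpha_bm, beta_bm):
    """Apply one Bush and Mosteller update to the attraction list p.

    p[k] is the attraction of strategy k; chosen is d(t); payoff is pi(t).
    A nonnegative payoff is treated as a reward, a negative one as a punishment.
    """
    updated = []
    for k, pk in enumerate(p):
        if payoff >= 0:
            # Reward: pull the chosen strategy towards 1, shrink the others.
            new = pk + alpha_bm * (1 - pk) if k == chosen else pk - alpha_bm * pk
        else:
            # Punishment: shrink the chosen strategy, push the others towards 1.
            new = pk - beta_bm * pk if k == chosen else pk + beta_bm * (1 - pk)
        updated.append(new)
    return updated


# Example with the freshmen estimates from Table 2 (alpha_BM = 0.2, beta_BM = 0.1)
# and hypothetical starting attractions.
after_reward = bush_mosteller_update([0.5, 0.3, 0.2], chosen=0, payoff=1.0,
                                     alpha_bm=0.2, beta_bm=0.1)
after_punishment = bush_mosteller_update([0.5, 0.5], chosen=0, payoff=-1.0,
                                         alpha_bm=0.2, beta_bm=0.1)
```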

(2) Borgers and Sarin's model 

Compared to Bush and Mosteller's model, Borgers and Sarin's model (Borgers and
Sarin 2000) details the information for evaluating the payoff of a strategy
adoption. It assumes that the evaluation of a strategy does not directly rely on the
absolute value of the actual payoff, but on the difference between the actual payoff
and the expected one. Let $A(t) \in [0,1]$ denote the payoff expectation of a user before
employing a strategy in period t and $A(1)$ be the initial expectation for the user
before decision-making.

If $\pi(t) > A(t)$, the attraction value of each strategy after period t is updated by:

$$P(j,t+1) = \begin{cases} P(j,t) + \alpha_{BS} \cdot (1 - P(j,t)) & j = d(t) \\ P(j,t) - \alpha_{BS} \cdot P(j,t) & j \neq d(t) \end{cases}$$
(Equation 5)

Otherwise, the attraction values are updated as follows:

$$P(j,t+1) = \begin{cases} P(j,t) - \alpha_{BS} \cdot P(j,t) & j = d(t) \\ P(j,t) + \alpha_{BS} \cdot (1 - P(j,t)) & j \neq d(t) \end{cases}$$
(Equation 6)

The payoff expectation is updated as follows:

$$A(t+1) = (1 - \beta_{BS}) \cdot A(t) + \beta_{BS} \cdot \pi(t)$$
(Equation 7)


The parameter $\alpha_{BS}$ is regarded as the reinforcement strength, whose value is the
absolute difference between the actual payoff and the expected one, i.e.,
$\alpha_{BS} = |\pi(t) - A(t)|$. The parameter $\beta_{BS}$ is fixed and stands for the adjustment
speed of the payoff expectation. The bigger $\beta_{BS}$ is, the more strongly the current payoff
influences subsequent strategy selection.

Similarly, Borgers and Sarin's model can be summarised as: if the actual payoff 
exceeds the expectation of an individual after a strategy is settled, then the 
probability of this strategy being further selected increases. On the contrary, if the 
actual payoff is smaller than the expectation, the probability of the strategy being 
adopted in future decreases. The expected payoff changes dynamically according to 
the actual payoff of the previous strategy adoption. 
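One step of this model might be sketched as follows, assuming payoffs and expectations scaled to [0, 1] so that the reinforcement strength stays within [0, 1]. The function name, argument names and example values are illustrative assumptions.

```python
def borgers_sarin_step(p, expectation, chosen, payoff, beta_bs):
    """Return (updated attractions, updated payoff expectation) for one period.

    The reinforcement strength is the absolute gap between the actual payoff
    and the expectation; its direction depends on whether the payoff exceeds
    the expectation.
    """
    strength = abs(payoff - expectation)
    if payoff > expectation:
        # Pleasant surprise: reinforce the chosen strategy.
        new_p = [pj + strength * (1 - pj) if j == chosen else pj - strength * pj
                 for j, pj in enumerate(p)]
    else:
        # Disappointment: weaken the chosen strategy.
        new_p = [pj - strength * pj if j == chosen else pj + strength * (1 - pj)
                 for j, pj in enumerate(p)]
    new_expectation = (1 - beta_bs) * expectation + beta_bs * payoff
    return new_p, new_expectation


# Hypothetical example: expectation 0.4, actual payoff 0.9, adjustment speed 0.1.
new_p, new_expectation = borgers_sarin_step([0.5, 0.5], 0.4, chosen=0,
                                            payoff=0.9, beta_bs=0.1)
```

In this example the payoff exceeds the expectation by 0.5, so the chosen strategy moves from 0.5 to 0.75 and the expectation rises from 0.4 to 0.45.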

(3) Cross's model 

As a modification to Bush and Mosteller's model, Cross's model (Cross 1973) is one
of the most acknowledged reinforcement learning models.

Let $R(\pi(t))$ be the reinforcement strength, which is a monotonic function of the
payoff. The attraction value of each strategy after period t is updated by:

$$P(j,t+1) = \begin{cases} P(j,t) + R(\pi(t)) \cdot (1 - P(j,t)) & j = d(t) \\ P(j,t) - R(\pi(t)) \cdot P(j,t) & j \neq d(t) \end{cases}$$
(Equation 8)

$$R(\pi(t)) = \alpha_{CR} \cdot \pi(t) + \beta_{CR}$$
(Equation 9)

In the above rules, $\alpha_{CR} \in [0,1]$ and $\beta_{CR} \in [0,1]$ are two parameters that control the
updating mechanism of the attraction of each strategy.



In Cross's model, the attraction of a strategy is defined as a linear function of the
payoff by configuring the reinforcement strength as a variable correlated to the
payoff, whereas in Bush and Mosteller's model, the reinforcement strength factors,
$\alpha_{BM}$ and $\beta_{BM}$, are fixed and independent of payoffs.
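The sketch below illustrates Cross's updating rule with a payoff-dependent reinforcement strength. It assumes payoffs scaled to [0, 1]; the function name and the example payoff are hypothetical, while the parameter values are the freshmen estimates from Table 2.

```python
def cross_step(p, chosen, payoff, alpha_cr, beta_cr):
    """One update of Cross's model.

    The reinforcement strength R = alpha_cr * payoff + beta_cr grows linearly
    with the payoff. The chosen strategy moves towards 1 by a fraction R of
    the remaining distance; every other strategy shrinks by the factor (1 - R).
    """
    r = alpha_cr * payoff + beta_cr
    return [pj + r * (1 - pj) if j == chosen else pj - r * pj
            for j, pj in enumerate(p)]


# Freshmen estimates (alpha_CR = 0.1, beta_CR = 0.1), hypothetical payoff of 0.5,
# three strategies with equal starting attractions.
start = [1 / 3, 1 / 3, 1 / 3]
after = cross_step(start, chosen=2, payoff=0.5, alpha_cr=0.1, beta_cr=0.1)
```

Because the update uses only the current attractions and the latest payoff, the next choice probabilities depend on history solely through the present state, which is the Markov property the paper attributes to this family of models.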

(4) Roth and Erev's model 

Both Cross's model and Borgers and Sarin's model are essentially modifications of 
Bush and Mosteller's model. All these models place emphasis on the Markov 
characteristics of players' strategy selection. In other words, when making a 
decision, an individual prefers to choose a strategy in terms of the payoff gained 
from the last strategy adoption. In contrast, Roth and Erev's models (Roth and
Erev 1995) underline users' prior experience. That is to say, decision makers select
a strategy based on their experiential expectations for all strategies. These 
expectations result from the accumulated effect of their past strategy adoptions, 
not only the last one. 

In Roth and Erev's model, the attraction value of each strategy after period t is 
updated under the following linear rules: 



(Equation 10 ) 


$$P_k(t+1) = A_k(t+1) \Big/ \sum_{l} A_l(t+1)$$
(Equation 11)

Here, $A_k(t)$ is the accumulated payoff from adopting the k-th strategy before and
in period t.
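Equation 10 is not legible in this copy of the paper. The sketch below therefore assumes the commonly cited form of the basic Roth and Erev rule, in which the payoff is simply added to the accumulated payoff of the chosen strategy and all other accumulated payoffs stay unchanged. That form is an assumption, not a transcription of the paper's Equation 10. The normalisation follows Equation 11. Function names and numbers are illustrative.

```python
def roth_erev_update(accumulated, chosen, payoff):
    """Assumed basic rule: add the payoff to the chosen strategy's accumulated payoff."""
    return [a + payoff if k == chosen else a for k, a in enumerate(accumulated)]


def choice_probabilities(accumulated):
    """Equation 11: each strategy's share of the total accumulated payoff."""
    total = sum(accumulated)
    return [a / total for a in accumulated]


# Hypothetical example: equal initial values, strategy 0 earns a payoff of 2.
accumulated = roth_erev_update([1.0, 1.0, 1.0], chosen=0, payoff=2.0)
probabilities = choice_probabilities(accumulated)
```

Unlike the earlier models, the state carried forward is the whole vector of accumulated payoffs, so every past adoption continues to influence later choices.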

(5) Roth and Erev's modified model 

In Roth and Erev's modified model (Erev and Roth 1996), the attraction values of
the strategies after period t are updated:

$$A_k(t+1) = (1 - \varphi) \cdot A_k(t) + E_j(k, R(\pi(t)))$$
(Equation 12)



(Equation 13 ) 


$$R(\pi(t)) = \pi(t) - X_{min}$$
(Equation 14)

$$P_k(t+1) = A_k(t+1) \Big/ \sum_{l} A_l(t+1)$$
(Equation 15)


where $\varphi$ is a forgetting parameter measuring the attenuation degree of users'
experiences influencing their strategy selection, and $X_{min}$ is the minimum
expectation of a user for all the strategies. Through $\varphi$ and $X_{min}$, different users
may make different subjective evaluations of a strategy even when the payoffs
from applying the strategy are equal. $E_j(k, R(\pi(t)))$ is a function controlling how
the payoff $\pi(t)$ from implementing strategy j updates the reinforcement strength
$A_k(t+1)$, and $\varepsilon$ is a transferring parameter that determines the extent of the
reinforcement strength transferring to the unemployed strategies.
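The modified model can be sketched as below. Equation 13 is not legible in this copy, so the spreading function used here is an assumption: the chosen strategy receives the share (1 - epsilon) of the reinforcement and the remainder is divided evenly among the unemployed strategies. The function name, the starting values, the payoff and the minimum expectation are hypothetical; the parameter values are the seniors' estimates from Table 2.

```python
def modified_roth_erev_update(accumulated, chosen, payoff, x_min, phi, eps):
    """One update following Equations 12 and 14, with an assumed spreading rule.

    phi discounts old experience (forgetting); eps is the share of the
    reinforcement R = payoff - x_min passed on to unemployed strategies.
    """
    m = len(accumulated)
    r = payoff - x_min  # Equation 14
    updated = []
    for k, a in enumerate(accumulated):
        # Assumed form of E_j(k, R): even split of eps * R over the other strategies.
        share = r * (1 - eps) if k == chosen else r * eps / (m - 1)
        updated.append((1 - phi) * a + share)  # Equation 12
    return updated


# Seniors' estimates (phi = 0.4, epsilon = 0.2407); other values are hypothetical.
new_values = modified_roth_erev_update([1.0, 1.0, 1.0], chosen=2, payoff=0.8,
                                       x_min=0.3, phi=0.4, eps=0.2407)
total = sum(new_values)
probabilities = [a / total for a in new_values]  # Equation 15
```

With these numbers, each old value is discounted to 0.6, the chosen strategy gains about 0.38 and each unemployed strategy gains about 0.06, which shows how a non-zero epsilon keeps alternative strategies in play.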